Heterogeneous  Multiscale  Methods  Applied  to  Stiff  Problems 

with  Varying  Scales 


Professor  Olof  Runborg 


NADA  CSC  KTH 
Lindstedtsv  3 

Stockholm,  Sweden  100  44 


EOARD  Grant  07-3081 


Report  Date:  July  2013 

Final  Report  from  1  June  2007  to  31  May  2012 


Distribution  Statement  A:  Approved  for  public  release  distribution  is  unlimited. 


Air  Force  Research  Laboratory 
Air  Force  Office  of  Scientific  Research 
European  Office  of  Aerospace  Research  and  Development 
Unit  4515  Box  14,  APO  AE  09421 


REPORT  DOCUMENTATION  PAGE 

Form  Approved  OMB  No.  0704-0188 

Public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  the  burden,  to  Department  of  Defense,  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports  (0704-0188),  1215  Jefferson 
Davis  Highway,  Suite  1204,  Arlington,  VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  any  penalty  for  failing  to  comply 
with  a  collection  of  information  if  it  does  not  display  a  currently  valid  OMB  control  number. 

PLEASE  DO  NOT  RETURN  YOUR  FORM  TO  THE  ABOVE  ADDRESS. 

1.  REPORT  DATE  (DD-MM-YYYY)  2.  REPORT  TYPE 

15  July  2013  Final  Report 

3.  DATES  COVERED  (From  -  To) 

1  June  2007 -31  May  2012 

4.  TITLE  AND  SUBTITLE 

Heterogeneous  Multiscale  Methods  Applied  to  Stiff  Problems 
with  Varying  Scales 

5a.  CONTRACT  NUMBER 

FA8655-07-1  -3081 

5b.  GRANT  NUMBER 

Grant  07-3081 

5c.  PROGRAM  ELEMENT  NUMBER 

61102F 

6.  AUTHOR(S) 

Professor  Olof  Runborg 

5d.  PROJECT  NUMBER 

5d.  TASK  NUMBER 

5e.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

NADA  CSC  KTH 

Lindstedtsv  3 

Stockholm,  Sweden  100  44 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

N/A 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

EOARD 

Unit  4515 

APO  AE  09421-4515 

10.  SPONSOR/MONITOR’S  ACRONYM(S) 

AFRL/AFOSR/IOE  (EOARD) 

11.  SPONSOR/MONITOR’S  REPORT  NUMBER(S) 

AFRL-AFOSR-UK-TR-201 3-0026 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Distribution  A:  Approved  for  public  release;  distribution  is  unlimited. 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

This  work  explores  fast  numerical  methods  for  solving  rate  equations  that  describe  the  population  densities  of  chemical  species 
or  atomic  states.  The  rate  equations  are  very  stiff  nonlinear  ordinary  differential  equations,  with  essentially  one  slow  time  scale 
and  a  large  range  of  fast  scales.  We  consider  implicit  multistep  and  one-step  methods.  They  require  the  solution  of  a  nonlinear 
system  of  equations  in  each  time  step  with  a  Newton  method.  To  reduce  the  cost  of  this,  we  use  approximations  or 
prefactorization  of  the  Jacobian  matrix.  Different  approximation  strategies  are  explored.  The  importance  of  exact  discrete 
conservation  is  highlighted,  leading  to  an  approach  where  the  Jacobian  is  truncated  to  banded  form  and  remaining  off-diagonal 
elements  are  adjusted  by  a  weight  that  depends  on  the  elements  in  the  full  Jacobian.  The  prefactorization  approach  uses  a  QZ 
decomposition  of  the  leading  part  of  the  Jacobian,  and  a  separate  treatment  of  a  rank  one  part.  Numerical  experiments  indicate 
that  these  methods  give  accurate  results  at  a  low  computational  cost. 

15.  SUBJECT  TERMS 

EOARD,  Mathematical  Modeling,  Turbulence 

16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 
ABSTRACT 

18,  NUMBER 
OF  PAGES 

19a.  NAME  OF  RESPONSIBLE  PERSON 

Kevin  Bollino 

a.  REPORT 

UNCLAS 

b.  ABSTRACT 

UNCLAS 

c.  THIS  PAGE 

UNCLAS 

SAR 

31 

19b.  TELEPHONE  NUMBER  (Include  area  code) 

+44(0)1895  616163 

Standard  Form  298  (Rev.  8/98) 
Prescribed  by  ANSI  Std.  Z39-18 


Heterogeneous  Multiscale  Methods  Applied  to 
Stiff  Problems  with  Varying  Scales 

Award  no:  FA8655-07- 13081 

Final  Report 

Period  of  performance:  1  June  2007  -  31  May  2013 


Olof  Runborg 

Report  date:  July  15,  2013 


Abstract 

We  develop  fast  numerical  methods  for  solving  rate  equations  that  de¬ 
scribe  the  population  densities  of  chemical  species  or  atomic  states.  The 
rate  equations  are  very  stiff  nonlinear  ordinary  differential  equations,  with 
essentially  one  slow  time  scale  and  a  large  range  of  fast  scales.  We  consider 
implicit  multistep  and  one- step  methods.  They  require  the  solution  of  a 
nonlinear  system  of  equations  in  each  time  step  with  a  Newton  method. 
To  reduce  the  cost  of  this,  we  use  approximations  or  prefactorization  of 
the  Jacobian  matrix.  Different  approximation  strategies  are  explored. 
The  importance  of  exact  discrete  conservation  is  highlighted,  leading  to 
an  approach  where  the  Jacobian  is  truncated  to  banded  form  and  re¬ 
maining  off-diagonal  elements  are  adjusted  by  a  weight  that  depends  on 
the  elements  in  the  full  Jacobian.  The  prefactorization  approach  uses  a 
QZ  decomposition  of  the  leading  part  of  the  Jacobian,  and  a  separate 
treatment  of  a  rank  one  part.  Numerical  experiments  indicate  that  these 
methods  give  accurate  results  at  a  low  computational  cost. 
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1  Background 

This  project  started  out  with  Pis  Prof.  Heinz-Otto  Kreiss  and  Dr.  Jon  Tegner 
at  KTH  Royal  Institute  of  Technology.  In  June  2011  it  was  transferred  to  Prof. 
Olof  Runborg,  also  KTH.  On  the  Air  Force  side,  Dr.  Jean-Luc  Cambier  has 
been  the  main  investigator. 


2  Introduction 

The  main  objective  of  this  project  is  to  investigate  fast  numerical  methods 
for  solving  ordinary  differential  equations  (ODEs)  with  strongly  varying  time 
scales.  The  application  of  the  methods  is  in  the  solution  of  rate  equations  for 
the  population  densities  of  chemical  species  or  atomic  states.  This  in  turn  is 
a  key  part  in  the  simulation  of  complex  reactive  fluid  dynamics,  where  a  huge 
number  of  instances  of  the  rate  equations  need  to  be  solved. 

For  a  system  with  N  atomic  states  the  ODEs  are  of  the  form 

^  =  (Mo  +  yMi  +  y2M2)X ,  y  =  zTX,  Mj  e  1™^, 

where  z  is  a  fixed  vector  and  X  E  MAr+1  contains  the  population  densities  of 
the  atomic  states  and  ions. 

The  rate  equations  are  very  stiff  ODEs  with  essentially  one  slow  time  scale 
and  many  fast  scales.  The  gap  between  them  is  large,  but  the  range  of  the  fast 
scales  is  also  large;  see  the  example  in  Figure  2,  where  the  a  typical  solution 
and  the  eigenvalues  of  the  Jacobian  of  the  right  hand  side  are  plotted.  This  fact 
makes  these  ODEs  very  difficult  to  solve  numerically. 

The  huge  stiffness  ratio  of  the  rate  equations  precludes  the  use  of  standard 
explicit  time- stepping  methods.  It  implies  a  severe  stability  requirement  that 
leads  to  a  complexity  of  0(N2X\arge)  for  a  system  of  N  states,  where  Aiarge  is  the 
largest  eigenvalue  of  the  Jacobian.  The  alternative  is  to  use  implicit  methods, 
which  have  no  stability  requirements  and  give  better  complexity.  One  can  choose 
the  time  step  based  on  the  smallest  eigenvalue  Asman  but  a  nonlinear  system  of 
equation  must  be  solved  in  each  time  step.  Since  the  matrices  are  full,  the 
dependence  on  N  in  the  complexity  is  therefore  worse;  the  cost  is  0(N3Xsma\\). 
A  third  possible  approach  would  be  explicit  “multiscale”  time-stepping  methods, 
such  as  Chebyshev  methods  [1,  7],  heterogeneous  multiscale  methods  (HMM) 
[3],  projective  integration  methods  [4,  5]  and  flow  averaging  methods  [9].  These 
have  much  lower  complexity  than  standard  explicit  methods,  but  require  a  clear 
separation  between  the  bulk  of  the  scales  and  a  few  fast  scales.  This  is  not  the 
scale  structure  of  the  rate  equations,  however,  which  makes  the  methods  less 
suitable. 

Instead  we  have  explored  implicit  multistep  methods  where  the  nonlinear 
system  of  equations  is  solved  using  approximations  or  prefactorization  of  the 
Jacobian  matrix.  The  methods  then  becomes  less  expensive,  with  a  complexity 
of  only  0(N2  Asman).  The  approximation  is  based  on  the  fact  that  the  Jacobian  is 
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strongly  diagonally  dominant  and  can  be  well  approximated  by  a  banded  matrix. 
It  must  be  done  carefully,  however.  In  particular  we  find  that  the  discrete 
conservation  of  the  solver  must  be  upheld  exactly  also  for  the  approximated 
solver.  Prefactorization  cannot  be  done  by  a  simple  LU  decomposition  since  the 
ODE  is  nonlinear,  but  by  treating  a  rank  one  part  of  the  Jacobian  separately, 
we  only  need  to  consider  a  system  of  the  type  ( A  +  yB)X  =  b  for  fixed  A,  B 
but  varying  y.  This  can  be  solved  fast  for  any  y  if  we  precompute  the  QZ 
factorization  of  A  and  B. 

We  also  consider  one  step  implicit  Runge-Kutta  methods  for  the  equations. 
Approximation  and  prefactorization  cannot  be  used  in  the  same  straightforward 
way  in  these  methods,  however. 

This  report  is  organized  as  follows.  In  Section  3  the  physical  model  is 
explained  and  the  governing  equations  are  derived.  Some  properties  of  the 
equations  and  their  consequences  for  numerical  approximation  are  presented  in 
Section  4.  Implicit  methods  with  an  approximate  Jacobian  are  discussed  in  Sec¬ 
tion  5,  while  the  precomputation  strategy  using  Qzv -factorization  is  described 
in  Section  6.  In  Section  7  one-step  methods  are  explored.  The  report  concludes 
with  Section  8  where  a  summary  of  the  results  is  given  together  with  an  outlook 
on  future  challenges. 

3  Physical  Model  and  Assumptions 

We  consider  a  simple  atomic  system  (atomic  hydrogen)  with  N  electronic  states, 
with  a  population  number  density  xn  ( n  =  0 . . . TV  —  1).  The  ionized  state 
has  a  number  density  x+  and  the  density  of  free  electrons  is  xe.  By  charge 
conservation,  we  have: 

Xe  ^  ^  Zq%q  (1) 

q 

where  the  summation  runs  over  all  atomic  states;  thus,  zq  =  0  for  q  =  0, . . .  N— 1, 
since  the  bound  electronic  states  of  H  are  neutral,  and  z+  =  1  since  the  hydrogen 
ion  has  a  unit  charge.  Charge  conservation  thus  allows  us  to  express  one  variable 
(xe)  in  terms  of  the  atomic  states  via  (1).  Using  chemical  element  conservation, 
it  is  also  possible  to  eliminate  one  other  variable  (e.g.  x+)\  for  a  total  initial 
number  of  atoms  TV#-,  we  have  J2qxq  =  Nh  =  constant,  leaving  only  N  —  1 
independent  variables.  However,  it  is  often  preferable  to  keep  all  variables  in 
the  system  of  equations,  with  the  understanding  that  some  eigenvalues  may  be 
zero,  expressing  the  conservation  properties  of  the  collisional-radiative  kinetic 
equations. 

The  complete  set  of  number  densities  of  electronic  levels  of  both  neutral 
and  ionized  atoms  form  an  Atomic  State  Density  Function  (ASDF)  which  can 
be  expressed  in  the  form  of  a  vector  X  =  {xq,q  =  0,  ...TV}.  The  rates  of 
change  of  these  population  densities  are  given  by  three  physical  processes:  a) 
collisional  bound-bound  transitions,  i.e.  excitations  and  de-excitations;  b)  colli- 
sional  bound-free,  i.e.  ionization  and  recombination;  c)  radiative  bound-bound. 
For  the  latter,  we  consider  radiative  deexcitations  only,  i.e.  spontaneous  decay 
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of  the  excited  states;  the  reverse  process  of  radiation  absorption  and  electronic 
excitation  is  neglected.  Note  that  we  also  have  ignored  radiative  bound-free 
transitions,  i.e.  radiative  capture  and  photo-ionization.  We  also  consider  only 
electron-impact  collisions.  Most  of  these  assumptions  and  simplifications  will 
be  relaxed  in  future  work. 

Let  us  first  consider  a  bound-bound  transition,  for  which  the  rate  of  change 
of  the  population  density  for  level  n  is  of  the  form: 

dxn  t  n  l 

~a(m\n)XnXe  +  P(n\m)XrnXe  (^) 

The  first  term  on  the  right  (2)  describes  the  loss  due  to  excitation  (|)  from  level 
n  to  m,  as  a  result  of  collisions  between  free  electrons  (of  number  density  xe) 
and  existing  states  (number  density  xn)\  the  second  term  describes  the  gain 
due  to  collisional  deexcitation  (j)  induced  by  free  electrons  (xe),  from  the  state 
m,  with  number  density  xm.  Hereafter,  we  will  denote  the  indices  on  the  rates 
such  that  the  left-most  index  (/ 1  is  the  final  state  and  the  right  index  | i)  is 
the  initial  state.  The  second  term  (deexcitation)  is  the  reverse  process  of  the 
former;  if  there  were  only  two  states  to  consider,  this  would  be  the  entire  rate 
of  change  for  level  n.  All  transitions  involving  the  state  n  must  be  counted,  so 
the  rate  of  change  for  excitation  and  deexcitation  alone  requires  us  to  sum-up 
the  right  hand  side  of  equation  (2)  over  all  levels  m  ^  n.  Note  that  for  the  same 
transition  between  the  levels  n  and  m,  we  also  have: 
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density  of  a  level  n : 

=  ~  a\m\n)XeXn  +  P(n\m)XeXm  + 

rn>n  m>n  m>n 

+  a(n|m)XeXm  ^m|n)XeXn  _  (^a) 

m<n  m<n  rn<n 

-  a\+[n)xexn  +  +)x+x2e 

This  system  can  be  written  in  matrix  form: 

^  =  (M0  +  xeM\  +  x\M2)  •  X  (8) 

The  right  side  is  non-linear  {pce  is  a  function  of  X )  but  note  that  the  first  entry 
on  the  right  is  due  only  to  the  radiative  rates  Anrn ,  the  only  purely  linear 
contribution  to  the  source  term,  while  the  last  term  has  the  highest  degree  of 
non-linearity,  but  is  due  entirely  to  the  recombination  /3^n which  is  a 
relatively  slow  source  term. 


3.1  Kinetic  Rates 


It  is  worth  describing  here  the  expressions  for  the  kinetic  rates,  in  order  to  gain 
some  insight  into  the  structure  of  the  spectrum  of  eigenvalues  of  the  system  (7). 
Using  classical  collision  theory  and  the  Bohr  model  for  the  hydrogen  atom,  the 
excitation  rates  are  of  the  form  [10]: 


^(ra|n)  —  (/^7Tao)Ve  (  ^  J  {^fnm)^nm 


where  ao  is  the  Bohr  radius,  Ih— 13.6  eV  is  the  Rydberg  constant, 

32  1  1  1 
Urn  SttVSti* 

V  nz  mz  ) 

is  the  oscillator  strength  of  the  transition  n  —  m, 


SkTe 


7r  me 


1/2 


is  the  mean  thermal  electron  velocity  and 

p  Cnm 

'Ipnrri  — 


E1  (Cnm  ) 


(9) 


(10) 


(11) 


(12) 


where  £ram  =  Enm/kT  with  Enm  =  Inil/n2  —  1/m2)  the  energy  gap  between 
levels  and  E1  is  the  exponential  integral: 


l)  =  I 


°o  £-b 


Ei(a)  =  /  —  db 


(13) 
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At  equilibrium  (“Boltzmann”),  the  ratio  of  population  densities  is: 


_  v? 

—  On 
X* 


i(Te)  =  9-^e~E" 

9n 


t/kTe 


(14) 


where  gn  is  the  degeneracy  of  state  n.  The  rate  of  change  is  null  at  this  equi¬ 
librium  condition  and  therefore 


B\  ,  ,  =  — e+?"m  .  aJ  (15) 

^  (n\m)  n  (ran)  v  / 

ym 

The  formulation  of  the  rate  involves  an  exponential  integral  with  no  simple 
expression;  at  low  temperature  (£nm  1),  we  can  use  the  approximation: 


Ei(a)  ~ 


(16) 


This  is  not  always  the  case  if  we  consider  a  large  number  of  states,  as  the 
energy  gaps  Enrn  decrease  and  eventually  we  have  £nm  ~  1,  in  which  case  a 
better  approximation  would  be: 


O  p  Cnm 

-  (17) 

3  s  nra 

Nevertheless,  we  will  restrict  ourselves  to  the  approximation  (16)  only,  in  which 
case: 


and 


The  factor  in  brackets  is  a  scale  of  the  rate  of  change  ( dx/dt )  (an  upper  bound); 
in  the  limit  £nm  — ►  0  i.e.  for  the  upper  states,  the  rates  approach  that  value. 
Another  scale  is  the  factor  Ijj/kT  in  the  definition  of  £nm,  which  will  describe 
how  stiff  the  system  is.  If  that  factor  is  very  low  (high  temperatures),  all  rates 
are  of  the  same  order;  at  low  temperatures,  the  exponential  term  dominates 
and  the  range  of  time  scales  is  increased.  Note  also  that  the  system  is  strongly 
diagonally  dominant ,  in  the  sense  that  transitions  with  small  change  in  quantum 
number  (m  — n~l)  have  a  higher  rate  than  those  with  m  —  n  1.  In  fact,  a 
fairly  good  approximation  may  be  to  consider  a  ladder  process,  i.e  transitions 
between  neighboring  states  only;  this  approximation  may  not  always  be  valid 
however,  when  other  atoms  besides  Hydrogen  are  considered,  and  this  will  be 
studied  further  as  we  extend  the  numerical  integration  schemes  to  more  complex 
systems. 

The  ionization  rate  coefficient  is 

«(+!„)  =  (4?r  a20)ve  (j^fj  V’(^n)  (20) 


a 


(m\n) 


'  2  32  _ 

4^0 - 7^  •  Ve 

7tV3 


/3,x 


(n|m) 


47T<2n 


n5m3(n  2  —  m  2)5 

1 

37T75  ( r?— 2  —  m~  2^5 


32 


y/S  \  n3m5(n  2  —  m  2) 


(18) 

(19) 
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where  £n  =  In/kT  and  In  is  the  ionization  potential  from  level  n.  The  equi¬ 
librium  for  ionization  and  recombination  (the  “Saha”  equilibrium)  involves  a 
different  relation: 


/  x+xe 
V  xn 


=  Sn(Te)  =  — 

9n 


(21) 


where  g+  is  the  degeneracy  of  the  ion  ground  state  (for  atomic  hydrogen, 
g+  =  2).  The  factor  indentified  as  Ze  is  the  partition  function  of  the  free 
electrons.  We  cannot  generally  assume  that  the  equilibrium  values  are  the  same 
for  both  bound-bound  and  bound- free  processes.  Usually  we  can  have  Boltz¬ 
mann  equilibrium  (14)  without  Saha  equilibrium,  but  hardly  the  reverse,  mostly 
because  it  takes  more  energy  to  ionize  than  to  excite;  for  the  upper  states  close 
to  the  ionization  limit  (n  1),  the  difference  is  less  significant.  Using  the 
principle  of  detailed  balance,  the  reverse  (recombination)  rate  is: 


4rn|  +  ) 


"4  a^h3 
7 r  m2kTe 


2 

«V(£n)e€’ 


(22) 


Finally,  the  spontaneous  emission  rates  from  an  upper  level  m  (in  sec  x) 


\n\m) 


8ttV\  gn 


mec°  J  grn 


trri  o 

Jn 


(23) 


For  hydrogen,  this  is: 


A 


1.6  1010 


(n\m) 


m3n(m2  —  n2) 


sec 


-l 


(24) 


4  Properties  of  the  Equations  and  Numerical 
Considerations 

As  shown  in  (8)  the  simplified  kinetics  for  a  system  with  N  atomic  states  for  a 
system  of  ODEs  of  the  form 

fl  =  (Mo  +  yM1  +  y2M2)X,  y  =  zTX,  (25) 

where  X  =  (xo, . . . ,  x^)  £  MAr+1  represents  the  population  densities  of  the 
various  states  —  xq  is  the  density  of  the  ground  atomic  state,  x%, , . .  xn-i  are 
the  densities  of  excited  states  and  xn  =  £+  is  the  density  of  ions  (we  have 
written  y  =  xe  to  express  a  general  conservation  property).  Moreover,  z  is 
the  fixed  vector  z  =  (0,  ...,0, 1)T  £  MiV+1  so  that  y  =  xn.  The  matrices 
Mj  £  M(^+!)x(iV+ i)  are  all  fixed  with  the  following  properties: 

•  M0  is  upper  triangular.  The  first  and  last  columns  are  zero.  It  has  a  fast 
decay  off  the  diagonal  and  the  column  sums  are  all  zero. 
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•  Mi  is  a  full  matrix,  except  that  the  last  column  is  zero.  The  submatrix 
(1:7V  —  1,1:7V  —  1)  has  fast  decay  off  the  diagonal.  The  column  sums 
are  all  zero. 

•  M2  is  a  rank  one  matrix.  It  is  zero,  except  for  the  last  column,  which  has 
sum  zero. 

See  Figure  1  for  examples.  These  properties  follow  from  the  physical  model 
described  in  the  previous  section.  In  particular,  the  zero  column  sums  imply 
the  natural  requirement  that  the  total  number  of  electrons  is  conserved, 

N 

Q(t)  :=  5>(t)  =  constant. 

#=o 

This  is  easily  seen  by  left  multiplication  of  (25)  by  1  =  (1, . . . ,  1)T  G  MAr+1, 

f  =  t  f  =  slT  ■ x  =  lT<M» +  »"■ + smx  =  0, 

3=0 

since  the  column  sums  1 T Mj  are  zero. 

4.1  Stiffness  and  Time  Scales 

The  Jacobian  of  the  right  hand  side  in  (25)  is  given  by 

J(X)  =  M0  +  yMi  +  y2M2  +  M1XzT  +  2yM2XzT.  (26) 

The  eigenvalues  of  J(X)  determine  the  time  scales  of  the  system.  Figure  2 
shows  a  typical  example  of  a  solution  X  and  the  eigenvalues  of  J(X)  as  a 
function  of  time  when  TV  =  10.  The  coarse  behavior  of  the  system  has  the 
timescale  given  by  the  smallest  (non-zero)  eigenvalue  Asman  ~  102,  while  the 
fastest  timescales  in  the  system  are  given  by  the  largest  eigenvalues  Aiarge  ~  109, 
initially.  The  large  eigenvalues  also  change  over  many  magnitudes  within  the 
relevant  computational  time.  Hence,  the  system  is  very  stiff,  with  a  stiffness 
ratio  Aiarge/Asmaii  between  106  and  109  when  TV  =  10.  Furthermore,  in  Figure  3 
one  can  see  that  Aiarge  grows  with  TV,  showing  that  the  stiffness  gets  worse  for 
larger  system  sizes  TV. 

4.2  Standard  Explicit  and  Implicit  Methods 

Standard  explicit  time-stepping  methods  cannot  be  used  in  these  extreme  con¬ 
ditions.  The  stability  requirement  would  demand  a  time  step  of  the  order  of 
V Aiarge-  Since  the  matrices  Mj  are  full,  each  step  would  have  a  computational 
cost  of  0(TV2),  giving  the  total  complexity 

complexity  of  explicit  methods  =  0(TV2Aiarge)- 
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MO 


Ml 


Figure  1:  Log  plots  of  element  sizes  for  Mq  and  M\  when  TV  =  50. 
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Solution 


t 


(a) 


(b) 


Figure  2:  Rate  equations  with  10  energy  states,  a)  Population  densities  as  a 
function  of  time;  b)  Eigenvalues  A j  of  Jacobian  as  a  function  of  time.  Note 
the  huge  span  between  largest  and  smallest  eigenvalue,  and  the  great  changes 
of  magnitudes  of  the  eigenvalues  over  time.  (The  noisy  data  for  the  smallest 
eigenvalue  is  just  numerical  errors;  the  exact  Jacobian  has  a  zero  eigenvalue  for 
all  N  since  its  column  sum  is  zero,  see  below.) 
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t 


Figure  3:  Eigenvalues  A j  as  a  function  of  time  for  larger  systems:  N  =  20  (a) 
and  N  =  50  (b). 
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Implicit  methods  have  no  stability  requirements  and  can  choose  the  time 
step  based  on  accuracy  needs  only,  i.e.  proportional  to  1/Asman  such  that  it  just 
resolves  the  slow  time  scale.  On  the  other  hand,  a  nonlinear  equation  needs  to 
be  solved  in  each  time  step,  which  in  the  end  amounts  to  inverting  I + a  J(X)  for 
some  constant  a.  This  is  a  full  matrix  since  M\  is  full,  giving  a  cost  of  0(N3). 
Thus 

complexity  of  implicit  methods  =  0(7V3Asman), 

which  is  typically  much  smaller  than  for  explicit  methods,  but  can  still  be  too 
high  because  of  the  cubic  dependence  on  N. 

This  is  thus  a  particularly  difficult  type  of  ODE  and  solving  it  fast  and  at 
low  computational  cost  presents  a  great  numerical  challenge. 

4.3  Multiscale  Explicit  Methods 

One  approach  would  be  to  find  non-standard  explicit  time-stepping  methods 
to  solve  the  rate  equations  efficiently,  based  on  new  multiscale  approaches.  In 
the  past  ten  years  there  has  been  a  renewed  interest  in  developing  such  explicit 
methods  for  stiff  ODE  problems,  for  instance  Chebyshev  methods  [1,  7],  hetero¬ 
geneous  multiscale  methods  (HMM)  [3],  projective  integration  methods  [4,  5], 
and  flow  averaging  [9].  Although  there  is  a  clear  gap  between  the  slowest  time 
scale  and  the  next,  beyond  that  there  are  many  fast  scales  spread  out  over  a 
large  interval.  This  situation  is  not  handled  well  by  the  multiscale  methods 
mentioned  above. 


5  Implicit  Methods  with  Approximate  Jacobian 

Implicit  methods  can  be  made  more  efficient  if  an  approximation  of  the  Jacobian 
J(X)  is  employed  such  that  I+aJ(X )  can  be  inverted  rapidly.  For  this  approach 
we  consider  the  BDF  methods,  which  are  standard  implicit  multistep  methods. 
The  first  three  in  this  family  reads 

X(n+1)  =  X(n)  +  A  tF(Xin+1)) 

JV"+1)  =  ^ A(n)  -  ijV""1)  +  tF(X(-n+1)), 

A("+1)  =  +  A("-2>  +  ^A tF(X(n+1)), 

where  V”)  ss  X(nAt). 

In  order  to  solve  the  nonlinear  system  in  each  time  step  one  iteration  of  the 
Newton  method  (with  initial  guess  X^n')  is  employed,  which  is  equivalent  to 
replacing  F(X )  with  a  local  linearization,  namely 

F(X(n+1))  =>  F(X(n))  +  J(X(ra))(X(n+1)  -X(n)).  (28) 


(27a) 

(27b) 

(27c) 
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This  reduces  the  problem  to  solving  a  linear  system  in  each  time  step,  more 
precisely 


[I  -  At  J(n)]X(n+1)  =  X(n)  +  A f[F(X(n))  -  J(n)X(ra)]  (29a) 

[I-  -A fJ(n)]X(n+1)  =  -X(n)  -  -X(n_1) 

3  3  3 

+  -A f[F(X(n))  -  J(n)X(n)]  (29b) 

o 

[I  -  ^AtJ(n)]X(n+1)  =  ^X(n)  -  -jW(n_1)  +  A x{n~ 2) 

+  ^At[F(X(n))  -  J(n)X(n)]  (29c) 

The  main  cost  in  each  iteration  is  thus  the  0(N 3)  cost  to  invert  I  +  aJ(X (n)). 
The  next  largest  cost  is  the  evaluation  of  F(X^)  and  J(X (n))  •  X^  which  are 
both  0(N 2)  operations  since  M0  and  Mi  are  full  matrices.  The  idea  here  is  to 
approximate  J(X)  «  J(X)  such  that  /  +  aJ(X)  can  be  inverted  with  at  most 
0(N2)  operations.  That  would  give 

complexity  of  implicit  method  with  approximate  J(X)  =  0(A/'2Asman), 

which  is  of  course  much  faster  than  standard  implicit  methods.  We  will  now 
discuss  different  strategies  for  this. 


Remark:  To  be  more  accurate  we  can  make  several  iterations  in  the  Newton 
method.  Then  the  above  equations  are  solved  multiple  time.  For  instance,  if  we 
use  K  iterations  in  the  BDF1  method  we  would  have 

[I-AtJM]Yk+1  =  XW+A  t[F(Yk)-J(Yk)Yk\,  Y0  =  X ("+1>  =  YK. 


5.1  Sherman— Morrison  Formula  for  the  Tail  of  J(X) 

We  split  the  expression  for  the  Jacobian  as 

J{X)  =  M0+yM1+R(X),  R(X)=y2M2  +  M1XzT +  2yM2XzT,  (30) 


We  can  write  M2  =  in:/1  with  m  G  RiV+1  being  the  last  column  of  M2.  There- 
fore, 

R(X)  =  (3 y2m  +  M^z7,  :=  m(X)zT ,  (31) 

is  a  rank  one  matrix.  We  now  note  that  for  any  A  we  can  invert  A  +  R  at  the 
same  cost  as  A  by  using  the  Sherman-Morrison  formula, 


(A  +  uv7)-1  = 


A  1uvTA  1 
1  +  vTA~1u 


(32) 


Indeed,  in  order  to  solve 


(A  +  R)X  =  b, 
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we  could  first  solve  As  =  rh  and  Aw  =  b.  Then,  by  (32),  the  solution  X  is  given 


as 


/  4  n\ _ 1  7  2  W 

x  =  (A  +  R)  b  =  w - t^s  =  w 

1  +  z1  s 


WN 

1  I 

1  +  S]V 


which  is  just  an  0(N )  cost  once  w  and  s  have  been  computed.  The  conclusion 
is  that  we  only  need  to  find  a  fast,  approximate,  way  to  invert  the  leading  part 
of  I  +  aJ,  namely  I  +  a(M0  +  yMi). 


5.2  Direct  Truncation  of  M0,  Mi 

Since  both  Mo  and  Mi  has  a  fast  decay  off  the  diagonal,  a  natural  approach 
to  speed  up  their  inversion  would  be  to  truncate  the  matrices  to  banded  form, 
with  a  small  bandwidth,  i.e.  setting  most  of  the  matrices  to  zero,  only  leaving  a 
few  non-zero  diagonals  untouched.  We  introduce  the  direct  truncation  operator 

A  =  trunc(A,p)  =>  Aij  = 

Then  we  let  Mj  =  trunc(Mj,p)  and  approximate 

M0  +  yMi  «  M0  +  yMu 

Since  I +a(Mo+yMj)  is  a  banded  matrix  with  bandwidth  p  the  cost  of  inverting 
it  is  0(Np2).  Hence,  we  could  allow  ourselves  to  take  p  ~  VN  to  have  a  cost 
of  0(N2)  as  we  needed. 

Unfortunately,  the  direct  truncation  method  yields  very  poor  results.  A 
typical  example  of  a  solution  with  directly  truncated  Mj  is  shown  in  Figure  4. 
The  sharp  transition  to  equilibrium  at  around  t  =  0.045  is  smeared  out  and 
the  equilibrium  is  not  reached  until  much  later.  A  more  quantitative  study  of 
the  errors  is  made  in  Figures  5  and  6.  Here  the  parameter  K  represents  the 
number  of  iterations  in  the  Newton  method  when  solving  the  nonlinear  systems 
of  equations  in  each  BDF  time  step.  The  error  reported  in  these,  and  subsequent, 
plots  is  the  maximum  componentwise  relative  error  between  the  solution  with 
and  without  truncation, 

\Xj  -Xj  I 

error  =  max  — - — . 

3  \xj\ 

The  final  time  is  selected  inside  the  transition  region  to  equilibirum,  T  =  0.045. 
One  can  see  that  the  methods  remains  stable  also  for  severe  truncations  (small 
p)  but  the  accuracy  is  not  good.  To  come  below  10%  error  more  than  15  of  the 
20  diagonals  must  be  kept  in  BDF1  with  At  =  10-4,  for  example.  For  p  <  10 
the  relative  error  is  close  to  one  for  all  method.  One  can  note  though  that  the 
error  is  reduced  if  more  iterations  (larger  K )  or  smaller  time  steps  At  are  used. 

The  underlying  reason  for  the  bad  results  with  direct  truncation  has  to  do 
with  the  conservation  properties  of  the  discrete  approximation,  which  we  dicsuss 
next. 


Aij ,  \i~j\<P, 

0,  \i  -  j\  >p. 


15 


Distribution  A:  Approved  for  public  release;  distribution  is  unlimited. 


No  truncation 


(b) 


Figure  4:  Solution  example  with  (a)  and  without  (b)  direct  trunction.  Param¬ 
eters  used  were  N  =  20,  At  =  10-5  and  p  =  7. 
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BDF1 


BDF2 


BDF3 


Figure  5:  Error  with  direct  truncation  as  a  function  of  the  number  of  diagonals  p 
and  Newton  iterations  K  for  the  BDF  methods.  Parameters  used  were  N  =  20, 
At  =  10-4  and  final  time  T  =  0.045. 
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BDF3 


diagonals 


Figure  6:  Error  with  direct  truncation  as  a  function  of  the  number  of  diagonals  p 
and  Newton  iterations  K  for  the  BDF  methods.  Parameters  used  were  N  =  20, 
At  =  10-5  and  final  time  T  =  0.045. 


18 


Distribution  A:  Approved  for  public  release;  distribution  is  unlimited. 


5.3  Discrete  Conservation 

As  mentioned  above,  in  the  exact  ODE  solution  the  total  number  of  electrons 
is  conserved.  In  the  discrete  case,  we  define 

qM  :=  \Tx^n\ 


Then,  is  constant  also  for  the  BDF  methods.  For  example,  in  BDF3, 

Q(n+l)  _  1T x(n+ 1) 

=  —lTX(-n)-—lTX<'n-1)  +  —lTX<'n-2)  +  —AtlTF(X<'n+1h 

11  11  11  11  v  ; 

—  n(n)  — —ni™-1) +  —(l(n -2) 

“lA  ii w  +iA 


which  is  a  difference  equation  with  the  solution  Q for  all  n  >  1  if 
initial  data  is  taken  such  that  Q^~2^  =  .  Morever,  exact  discrete 

conservation  holds  also  for  the  approximate  BDF  methods  where  F  is  linearized. 
This  follows  in  a  similar  way  upon  noting  that 


1TJ(X)  =  lT(M0+yM1+y2M2+M1XzT  +  2yM2XzT)  =  0, 


again  since  1 T  Mj  =  0.  Then,  again 

Q(n+1)  =  jTjj  _  Aa tJ{n)]Xin+1) 
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=  —  lTVn)-  —  lTX(n"1)  +  —  lTX(n"2)  +  —  AtlT[FXn))  -  J(n)X(n)] 

11  11  11  11  L  ;  J 


=  ^Q(n)-^Q("_1)  +  jiQ(n~2)- 


(Note  that  this  holds  true  also  if  multiple  iterations  are  used,  K  >  1,  since  it 
holds  for  each  iteration.) 

However,  if  we  directly  truncate  M0  and  M\  as  in  the  previous  section  they 
no  longer  have  exact  column  sum  zero,  1 T Mj  ^  0  and  then  neither  has  the 
approximate  Jacobian.  There  is  no  longer  exact  discrete  conservation,  which 
for  this  problem  leads  to  bad  performance  of  the  numerical  method. 


5.4  Weighted  Truncation  of  M0,  Mi 

Given  the  discussion  about  conservation  in  the  previous  section,  the  methods 
would  be  improved  if  the  truncation  is  done  such  that  the  column  sums  of  the 
matrices  are  unaffected.  This  can  for  instance  be  done  by  re-weighting  the 
off-diagonal  elements  as  follows 


A  =  trunc  w(A,p) 


Aij  —  \ 


A.. 

WjAij, 

0, 


i=j, 

0  <  \i-j |  <P, 
\i~j\  >  P, 
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where  the  weights  are  given  by 


' N  a.: 

'\i-j\<P  ^3 


An  -  v,; 

It  is  easy  to  check  that  1 T A  =  1T A.  We  use  this  truncation  on  the  full  matrix1 

M  =  truncw(Mo  +  yM1,p). 


Thus,  with  this  truncation  strategy  we  enforce  discrete  conservation  and  we  ex¬ 
pect  a  better  behavior  of  the  numerical  solution.  Indeed,  much  better  results  are 
obtained.  The  corresponding  result  with  weighted  truncation  for  the  example 
in  Figure  4  is  shown  in  Figure  7.  Here,  the  transition  to  equilibrium  is  captured 
without  problems.  The  quantitative  study  corresponding  to  Figures  5  and  6  is 
shown  in  Figures  8  and  9.  Small  errors  can  be  obtained  also  with  moderately 
sized  p.  For  example  in  BDF3  with  K  =  1  and  At  =  10-5  one  can  truncate  to 
just  3  diagonals  and  still  obtain  a  relative  error  close  to  1%.  On  the  other  hand, 
the  methods  become  unstable  (huge  relative  error)  for  small  enough  p  or  large 
enough  At.  This  seems  to  be  different  from  the  direct  truncation.  However,  the 
conclusions  made  for  direct  truncation  about  the  influence  of  K  and  At  hold 
true  also  for  weighted  truncation:  larger  K  and  smaller  At  give  smaller  error. 


6  Implicit  Methods  with  Precomputation 

As  was  shown  in  the  previous  section  the  main  cost  of  a  standard  implicit 
method  is  to  invert  the  matrix  I  +  aJ  in  each  iteration.  Via  the  Sherman- 
Morrison  formula  the  work  is  reduced  to  inverting  the  leading  order  matrix 
/  +  a(M0  +  yMi).  Instead  of  using  an  approximation  M0  +  yM\  to  lessen 
this  cost,  one  can  precompute  a  factorization  of  the  matrix.  The  usual  LU 
decomposition  does  not  help,  however,  since  the  parameter  y  will  be  different 
in  every  time  step.  Instead  we  use  the  generalized  Schur  factorization  which 
provides  a  simultaneous  decomposition  of  /  +  aMo  and  aMi . 

The  main  theoretical  result  behind  this  approach  is  the  generalized  Schur 
decomposition  theorem  saying  that  for  any  two  N  x  N  matrices  A  and  B  there 
are  two  orthogonal  matrices  Q  and  Z  such  that  both  QAZ  and  QBZ  are  upper 
triangular.  There  is  also  a  stable  algorithm  to  compute  these  matrices:  the  QZ- 
algorithm  [6] .  In  our  case,  we  find  the  orthogonal  matrices  related  to  I  + 
and  aMi  such  that 


Q(I  +  aMo)Z  =  mo,  olQM\Z  —  mi, 
where  both  m o  and  mi  are  upper  triangular.  Then,  to  solve 

(/  +  a(M0  +  yMi))x  =  6, 

1  Since  the  truncation  operation  is  not  linear  in  this  case,  the  result  is  different  from  trun¬ 
cating  Mq  and  Mi  individually. 
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No  truncation 


(a) 


(b) 


Figure  7:  Solution  example  with  (a)  and  without  (b)  weighted  trunction.  Pa¬ 
rameters  used  were  N  =  20,  At  =  10-5  and  p  =  7. 
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BDF3 


diagonals 


Figure  8:  Error  with  weighted  truncation  as  a  function  of  the  number  of  diag¬ 
onals  p  and  Newton  iterations  K  for  the  BDF  methods.  Parameters  used  were 
N  -  20,  At  =  10-4  and  final  time  T  =  0.045. 
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diagonals 


Figure  9:  Error  with  weighted  truncation  as  a  function  of  the  number  of  diag¬ 
onals  p  and  Newton  iterations  K  for  the  BDF  methods.  Parameters  used  were 
N  -  20,  At  =  10-5  and  final  time  T  =  0.045. 
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we  multiply  by  Q  and  Z  from  the  left  and  right,  respectively,  to  get 

Q(I  +  a(M0  +  yM1))ZZ*x  =  Qb  =>  (/  +  a(m0  +  ymi))(Z*x)  =  Q6. 

Hence,  x  is  obtained  from 

(/  +  a(mo  +  ymi))x  =  Qb,  x  =  Zx, 

where  the  system  matrix  /  +  a(mo  +  ?/mi)  is  now  upper  triangular.  The  cost  of 
computing  x  is  thus  given  by  the  cost  of  two  matrix  multiplications  by  orthog¬ 
onal  matrices  and  one  linear  solve  with  an  upper  triangular  matrix.  These  all 
have  costs  of  the  order  TV 2  operations.  The  QZ-algorithm  itself  costs  0(N3). 
The  total  complexity  of  this  approach  is  thus 

complexity  of  implicit  method  with  precomputation  =  0(N 3  +  TV2Asman). 

Thus  when  N  <  Asman  or  when  very  high  accuracy  is  needed  (small  At)  this 
is  as  fast  as  the  methods  based  on  approximate  Jacobians.  Since  there  is  no 
approximation  involved,  the  precomputation  method  is,  however,  more  robust 
and  accurate. 

6.1  Solution  Steps 

The  steps  to  evaluate  X  given  by 

(/  -  A tJ{X))X  =  6, 

in  the  BDF  methods  (29)  can  be  summarized  as  follows. 

•  Compute  the  generalized  Schur  factorization  of  J+aMo  and  olM\  to  obtain 
mo,  mi,  Q  and  Z. 

•  Let  y  =  Xjsr  and  compute 

m  =  3  y2m  +  M\X, 

according  to  (31).  (Here  m  is  the  last  column  of  M2.) 

•  Solve 

(/  +  a(m0  +  ymi))s  =  Qm,  (I  +  a(m0  +  ymi))w  =  Qb , 

•  Let  s  =  Zs  and  w  =  Zw. 

•  The  solution  is  given  by 


X  =  w- 


wN 

- - s. 

1  +  Sn 
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7  One  Step  Methods 


In  the  multistep  methods  discussed  above,  initial  data  must  be  given  for  several 
time  steps,  which  is  inconvenient  for  the  application  where  the  rate  equations 
are  used;  they  are  coupled  with  a  CFD  solver  where  only  one  data  point  is  given 
each  time  they  need  to  be  solved.  The  simplest  way  to  get  around  this  problem 
would  be  to  use  a  one-step  method,  which  only  requires  one  initial  data.  We 
thus  investigated  the  use  of  implicit  Runge-Kutta  (IRK)  methods,  focusing  on 
2-stage  methods. 

For  an  autonomous  ODE  X'  =  F(X)  the  update  step  from  X ^  — ►  X^n+1) 
in  a  2-stage  IRK  is  defined  by  the  parameters  and  b &  as  follows.  First,  find 

£1 ,  £2  such  that 

Cl  =  F  (x<">  +  Atlantic  +  a126])  ,  (33a) 

&  =  F  (xW  +  At[a21&  +  a22&])  ,  (33b) 

and,  second,  compute 

X(n+1)  =  X(n)  +  At(lnZl  +  &2&).  (34) 


Note  that  since  £i,£2  appear  in  both  the  left  and  right  hand  side  of  (33)  the 
methods  are  implicit.  As  in  the  multistep  case  we  therefore  need  to  solve  this 
equation  by  Newton’s  method  with  zero  as  initial  guess.  Upon  replacing  F  by 
its  linearized  version  as  in  (28)  we  get 

=  (F{XW)  +  MJ(XW)[a11$i+a12Z2]\ 

K&J  \F(XM)  +  AtJ(X^)[a21^  +  a22^]J  ’ 


which  leads  to  the  linear  system  for  £1  and  £2, 

(Zi\  =  (F(XW)\ 
{&)  \F(XW)J- 


(anJ(XW)  a12J(XW)\ 
\a21J(XW)  a22J(lW)l 


(35) 


This  system  must  be  solved  in  each  time  step.  The  solution  is  then  used  to 
update  X ^  as  in  (34). 

We  first  tested  the  classical  fourth  order  Gauss- Legendre  (GL4)  IRK  method, 
in  which  b\  =  b<i  =  1/2  and 

1  1  a/3  1  \/3 

an  -  a22  -  a12  -  -  -  a2l  m.-  +  — . 

The  results  were  not  satisfactory,  however.  The  GL4  method  is  very  accurate 
when  the  solution  is  on  the  slow  manifold,  but  it  is  bad  at  damping  the  fast, 
unresolved,  modes  compared  to  the  BDF  methods.  This  is  precisely  the  typical 
situation  at  the  initial  part  of  the  computation.  An  example  of  this  problem  is 
shown  in  Figure  10. 

The  underlying  reason  for  the  poor  performance  turned  out  the  be  that  GL4 
is  not  an  L-stable  method,  i.e.  its  stability  function  does  not  decay  to  zero  at 
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BDF  3  and  GL  BDF  3  and  GL 


(a)  At  =  1CT4  (b)  At  =  10“5 


Figure  10:  Initial  time  evolution  of  two  typical  population  densities  xi,  x 2  in 
an  N  =  20  system.  Initial  data  is  zero  for  these  densities,  i.e.  far  off  the  slow 
manifold.  The  solution  using  BDF3  (solid  lines)  is  satisfactory  for  both  the  large 
and  small  time- step;  it  reaches  the  slow  manifold  more  or  less  immediately,  as  it 
should.  (The  true  relaxation  time  is  ca  10-6.)  The  solution  using  GL4  (dashed 
lines),  on  the  other  hand,  takes  very  long  time  to  reach  the  slow  manifold,  even 
with  the  small  time-step.  The  solution  with  Radau  la  cannot  be  distinguished 
from  the  BDF3  solution. 


infinity.  Instead  we  used  the  L-stable  third  order  Radau  la  IRK  method,  for 
which  bi  =  1/4,  b2  =3/4  and 

1  1  5 

a  11  =  a21  =  a  12  =  —  jit  «22  =  — • 

In  this  case  we  obtained  very  good  results.  The  solution  plots  corresponding  to 
those  in  Figure  10  coincide  perfectly  with  the  ones  for  BDF3. 

7.1  Properties  of  the  Linear  System  of  Equations 

Using  Kronecker  notation  the  system  matrix  in  (35)  can  be  written 

I-AtA®J(xW),  A=(au  °12), 

\a21  &22 ) 

and  with  the  expresssions  in  (30)  and  (31)  we  get 

I  -  AtA  <g>  J(X{n))  =  1-  AtA  ®  M0  -  yAtA  <g>  Mx  -  AtA  (g)  (m/), 
where 

A  0  (mzT)  hr)  (°) T 

\a2imJ\0J  \a22mj  \zj 
This  shows  that  the  matrix  is  again  of  the  form 

I  -\-  ct(A4o  ~\~  yMi)  T 
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where 

Mo  =  A  0  Mo,  Ml o  =  A  0  Mi. 

and  1Z  is  now  of  rank  two  (instead  of  one).  To  treat  1Z  we  therefore  need 
to  apply  the  Sherman-Morrison  formula  twice,  iteratively.  The  same  kind  of 
precomputation  as  for  multistep  method  can  be  used  for  the  leading  part  I  + 
a(Mo  +  yM\). 

It  should  be  noted,  however,  that  the  matrix  form  is  less  favorable  when  more 
iterations  in  the  Newton  method  is  used,  i.e.  when  K  >  1.  Then  the  Jacobian 
of  F  is  evaluated  at  two  different  points,  leading  to  a  system  matrix  with  two 
parameters  yi  and  7/2  •  The  QZ  approach  cannot  be  used  on  this  system. 

Moreover,  the  strategy  to  approximate  the  Jacobian  using  truncation  gives  a 
different  sparsity  pattern.  Instead  of  a  banded  matrix,  we  get  a  2  x  2  block  struc¬ 
tured  matrix  where  each  block  is  banded.  Solving  this  fast  is  not  as  straightfor¬ 
ward.  In  the  tests  that  we  have  performed  we  have  also  observed  that,  accuracy 
wise,  the  one-step  Runge-Kutta  methods  are  in  general  more  sensitive  to  trun¬ 
cation  than  the  multistep  BDF  methods. 

8  Conclusions  and  Future  Work 

We  have  explored  implicit  methods  to  solve  the  stiff  rate  equations.  To  make 
the  methods  efficient  we  use  an  approximate  Jacobian  or  a  precomputed  factor¬ 
ization  of  the  Jacobian. 

The  Jacobian  is  strongly  diagonally  dominant  and  can  be  well  approximated 
by  a  banded  matrix.  Implicit  methods  then  becomes  relatively  inexpensive. 
However,  the  results  are  sensitive  to  precisely  how  the  approximation  is  done. 
We  have  found  that  it  is  of  vital  importance  to  approximate  the  Jacobian  with  a 
method  which  maintains  the  exact  discrete  conservation  of  the  solver.  By  using  a 
weighted  truncation  to  banded  form  we  achieve  this  and  obtain  accurate  results 
at  a  low  computational  cost. 

In  many  cases  the  generalized  Schur  factorization  of  the  Jacobian  can  be 
used  to  speed  up  the  time  stepping  of  the  ODEs.  This  is  a  more  robust  and 
accurate  approach  than  approximating  the  Jacobian.  It  is  preferable  if  the 
matrix  structure  of  the  Jacobian  allows  it,  and  if  the  accuracy  requirements 
in  the  ODE  solver  is  high  enough  so  that  the  prefactorization  cost  is  small 
compared  to  the  time  stepping  costs. 

One  step  methods  are  simpler  to  initialize  than  multistep  methods,  which 
is  an  advantage  when  the  ODEs  are  coupled  to  the  flow  solver.  Because  of  the 
extreme  stiffness  of  the  system  it  is  important  to  use  L-stable  one-step  methods, 
such  as  the  Radau  la  implicit  Runge-Kutta  method.  The  linear  system  of 
equations  that  must  be  solved  in  each  time  step  involves  the  same  Jacobian  as 
for  multistep  method,  but  it  has  in  general  a  blocked  matrix  structure,  which 
means  that  the  approximation  and  prefactorization  strategies  above  must  be 
adapted. 

The  main  directions  for  future  work  are: 
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•  Better  understanding  of  the  errors  in  the  weighted  truncation.  How  should 
the  number  of  diagonals  and  iterations  in  the  Newton  solver  be  chosen  to 
make  the  methods  robust  and  accurate?  Some  adaptive  approach  may 
be  necessary.  Here  one  could  get  ideas  from  dynamic  sparsing  techniques 
for  stiff  ODEs  [8]  or  methods  for  finding  sparse  preconditioners  in  PDE 
problems  [2].  In  these  methods  a  sparse  approximation  of  the  Jacobian 
is  found  dynamically  based  on  stability  criteria  for  the  ODE  or  spectral 
properties  of  the  matrices. 


•  Choice  of  implicit  Runge-Kutta  method.  There  are  many  options  for  L- 
stable  Runge-Kutta  methods  (e.g.  the  Rosenbrock  family).  These  should 
be  explored  and  tested  for  accuracy,  robustness  and  speed.  The  resulting 
matrix  structure  for  the  linear  system  of  equation  is  of  particular  concern. 
Diagonal  implicit  Runge-Kutta  methods  (DIRK)  could  be  an  attractive 
alternative,  since  the  matrix  structure  is  then  block  triangular. 

•  More  complex  and  larger  systems  of  rate  equations  with  multiple  species. 
This  leads  not  only  to  larger  ODE  systems,  but  also  to  different  matrix 
structures,  for  which  our  truncation  and  precomputation  strategies  must 
be  adapted. 

•  Dealing  with  extreme  stiffness.  For  larger  systems  (N  ~  100)  the  stiffness 
ratio  often  exceeds  1015  which  means  that  matrix  condition  numbers  are 
of  the  same  order.  Double  precision  is  then  not  sufficient  to  obtain  any 
useful  accuracy.  Some  alternative  modeling  may  be  necessary  for  these 
systems,  for  instance  approximating  them  by  differential  algebraic  equa¬ 
tions  (DAE). 

•  Splitting  methods.  The  complexity  could  potentially  also  be  reduced  by 
using  splitting  methods  for  Mo,  M\  and  M2.  This  would  mean  setting 
M(x)  =  yMi  +  y2 M2  and  do  a  time-stepping  of  the  type 


X(n)  =  x(n)  +  ^M(X(n))X(n), 

X(n)  =x(n)  +  A  tM0X{n), 


(n+1)  =  j^(n) 


^-M(X{n+1))X{n+1) . 


This  approach  should  be  investigated.  It  may  e.g.  be  revealing  to  study 
the  linear  problem  here,  where  M(x)  :=  M(x 0). 
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